Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for listing voices and languages #7

Draft
wants to merge 47 commits into
base: main
Choose a base branch
from

Conversation

panaC
Copy link
Member

@panaC panaC commented Aug 13, 2024

No description provided.

@panaC panaC self-assigned this Aug 13, 2024
@HadrienGardeur
Copy link
Contributor

You can convert this to a draft PR and move it back to a PR once it's ready rather than putting [WIP] as a prefix.

@panaC panaC marked this pull request as draft August 13, 2024 17:29
@HadrienGardeur HadrienGardeur changed the title [WIP] Readium Speech JS library development 🙈 Initial support for listing voices and languages Aug 14, 2024

return new Promise((resolve, _reject) => {

let counter = 1000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the reason that you need to check the voices every 10ms 1000 times? I imagine it's a hack to deal with browsers, but why those values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's 1000 times at the worst condition, which end with the return of an empty array.
In Firefox and Safari Speech API works at start both in a script in the header deferred or not
In Chrome Speech API is available just before the fire of the 'DOMContentLoaded' event in a deferred script
In Edge the more slowness browser Speech API is available in about 60ms in my computer.

So more or less a counter between 10 to 100 is better than 1000 that could be too long.

Do you think to an alternative to catch the result of getVoices() ?

script/extract-json.mjs Outdated Show resolved Hide resolved
@@ -0,0 +1,26 @@


Copy link
Member

@chocolatkey chocolatkey Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to think about in a future version is to come up with a way of compressing this data. Adding all these strings to a JS bundle is a significant size increase (est. ~60KB based on minified data.js). This could be done with, for example, the CompressionStream API and/or a more efficient encoding. Let me know your thoughts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Thorium / navigator in some cases we pass lots of textual data (serialized JSON) via URL query parameters to guarantee that the webview / iframe resolves the data "synchronously" (instant access). We use code that looks like this (there is some usage of NodeJS / Electron Buffer API but nothing that couldn't be pure Web API):

const json = { /* LOTS OF DATA */ };
const jsonStr = JSON.stringify(json);
const cs = new CompressionStream("gzip");
const csWriter = cs.writable.getWriter();
csWriter.write(new TextEncoder().encode(jsonStr));
csWriter.close();
const buff = Buffer.from(await new Response(cs.readable).arrayBuffer());
const b64 = buff.toString("base64");
// pass b64 as escaped URL query parameter

The inverse operation:

const buff = Buffer.from(b64, "base64");
const cs = new DecompressionStream("gzip");
const csWriter = cs.writable.getWriter();
csWriter.write(buff);
csWriter.close();
const buffer = Buffer.from(await new Response(cs.readable).arrayBuffer());
const jsonStr = new TextDecoder().decode(buffer);
const json = JSON.parse(jsonStr);

@HadrienGardeur
Copy link
Contributor

Based on this latest commit, I've identified a bug with Finnish and Filipino.

In Edge, while testing the demo I noticed that the Filipino voices are listed under Finnish > Philippines and that Filipino is not listed in the list of languages.
Finnish voices use fi-FI while Filipino voices use fil-PH. It seems that three letter codes for languages are not properly and could end up grouped with two letter codes that start with the same letters.

@HadrienGardeur
Copy link
Contributor

I've also inspected the output in Edge and noticed that the Microsoft Natural Voices have an incorrect value for pitchControl;

This value seems to be skipped by the code importing the JSON data.

build/src/voices.js Outdated Show resolved Hide resolved
@panaC
Copy link
Member Author

panaC commented Aug 16, 2024

I've also inspected the output in Edge and noticed that the Microsoft Natural Voices have an incorrect value for pitchControl;

This value seems to be skipped by the code importing the JSON data.

I don't understand where you set the pitchControl value for the voice, even in the demo mode there is no pitchControl setting in SpeechSynthesisUtterance. Did I miss something ?

@HadrienGardeur
Copy link
Contributor

On Android I've noticed that I cannot select some voices available in the demo. All languages supported officially work fine, but if I select "Chinese" for example, the list of voices is never loaded.

Here's a list of languages that seem affected:

  • yue
  • su
  • Serbian
  • sd
  • sat
  • Pendjabi
  • mni
  • ks
  • Hindi
  • bs

As you can notice, quite a lot of languages also seem to have missing translations in Intl.DisplayNames, although their voice names are properly translated by Google.

Instead of "hello world", I think that we should just default to an empty string when there's no test utterance.

@panaC
Copy link
Member Author

panaC commented Aug 23, 2024

Fixes #6

  • Sort by Language/region
  • Sort by gender
  • Sort by name
  • Sort by quality

Fixes #5

  • group by language
  • group by regions
  • group by kind of voices

Fixes #4

  • filter on offline availability
  • filter on gender
  • filter on quality
  • filter on novelty
  • filter on veryLowQuality
  • filter on recommended

Fixes #2
Fixes #1

function getVoices(): Promise<IVoices[]>

export interface IVoices {
    label: string;
    voiceURI: string;
    name: string;
    language: string;
    gender?: TGender | undefined;
    age?: string | undefined;
    offlineAvailability: boolean;
    quality?: TQuality | undefined;
    pitchControl: boolean;
    recommendedPitch?: number | undefined;
    recommendedRate?: number | undefined;
}

export const defaultRegion = ${JSON.stringify(defaultRegion)};
`;

const filePath = './src/data.ts';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This source file is script-generated, may I suggest:

  1. adding a comment header at the top of the file that explains which script produced the Typescript code, and at what date/time (or ideally: git revision / commit hash).
  2. locate the generated file inside a "gen" subfolder or the "src" source tree, or to rename the file e.g. src/data.gen.ts

if (Array.isArray(voices) && voices.length) return resolve(voices);
setTimeout(tick, 10);
}
setTimeout(tick, 10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setTimeout() has a resolution of about 20ms, last time I checked. Also, IIRC this is affected by browser window visibility (it doesn't really matter for this use-case though)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I think that a 200ms tick would be fine for this.

function filterOnGender(voices: IVoices[], gender: TGender): IVoices[]

function filterOnGender(voices: IVoices[], gender: TGender): IVoices[]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove duplicate

Suggested change
function filterOnGender(voices: IVoices[], gender: TGender): IVoices[]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

6 participants